
VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain

Neural Information Processing Systems

Self- and semi-supervised learning frameworks have made significant progress in training machine learning models with limited labeled data in the image and language domains. These methods rely heavily on the unique structure of the domain datasets (such as spatial relationships in images or semantic relationships in language), and they are not adaptable to general tabular data, which lacks such explicit structure. In this paper, we fill this gap by proposing novel self- and semi-supervised learning frameworks for tabular data, which we refer to collectively as VIME (Value Imputation and Mask Estimation). We create a novel pretext task of estimating mask vectors from corrupted tabular data, in addition to the reconstruction pretext task, for self-supervised learning. We also introduce a novel tabular data augmentation method for self- and semi-supervised learning frameworks. In experiments, we evaluate the proposed framework on multiple tabular datasets from various application domains, such as genomics and clinical data. VIME exceeds state-of-the-art performance in comparison to the existing baseline methods.
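The pretext task described above can be sketched in a few lines: sample a binary mask, replace the masked entries of each feature with values drawn from that feature's empirical marginal distribution (here approximated by shuffling within columns), and ask the model to recover which entries were corrupted. This is a minimal sketch under assumptions, not the authors' implementation; the function names `mask_generator` and `pretext_generator` follow the paper's description, but the details are illustrative.

```python
import numpy as np

def mask_generator(p_m, shape, rng):
    """Sample a binary mask; 1 marks an entry to corrupt."""
    return rng.binomial(1, p_m, shape)

def pretext_generator(m, x, rng):
    """Corrupt x: where m == 1, replace each entry with a value drawn
    from that feature's empirical marginal (a within-column shuffle)."""
    n, d = x.shape
    # x_bar[i, j] is a value of feature j sampled from the data itself
    x_bar = np.stack([rng.permutation(x[:, j]) for j in range(d)], axis=1)
    x_tilde = x * (1 - m) + x_bar * m
    # the mask the model must estimate: entries that actually changed
    m_new = (x != x_tilde).astype(float)
    return m_new, x_tilde
```

Note that `m_new` can be a strict subset of `m`: a sampled replacement may coincide with the original value, in which case there is nothing for the mask estimator to detect.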


Review for NeurIPS paper: VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain

Neural Information Processing Systems

Weaknesses: My central concern with this paper is the misalignment between the motivation and the methodology. As motivation, the authors argue that self-supervised CV and NLP "algorithms are not effective for tabular data." The proposed model, though, is effectively the binary masked language model whose variants pervade self-supervised NLP research. Granted, instead of masking words, the proposed model masks tabular values, but this is a very similar pretext task. In fact, there is concurrent work that learns tabular representations using a BERT model [1].


Review for NeurIPS paper: VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain

Neural Information Processing Systems

This paper proposes a new reconstruction loss for unsupervised training of representations. This loss extends auto-encoders via a pretext task that uses the marginal distribution of features. The reviewers were unanimous in their decision to accept this paper.
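The loss the meta-review refers to combines two terms on the corrupted input: binary cross-entropy on the recovered mask and a weighted reconstruction error on the original features. A minimal sketch, assuming an MSE reconstruction term and a weighting coefficient `alpha` (the symbol and its value here are assumptions, not taken from the paper):

```python
import numpy as np

def vime_self_loss(m_true, m_hat, x_true, x_hat, alpha=2.0):
    """Self-supervised loss: BCE on the estimated mask plus a weighted
    feature-reconstruction (MSE) term."""
    eps = 1e-7
    m_hat = np.clip(m_hat, eps, 1 - eps)  # avoid log(0)
    l_mask = -np.mean(m_true * np.log(m_hat)
                      + (1 - m_true) * np.log(1 - m_hat))
    l_recon = np.mean((x_true - x_hat) ** 2)
    return l_mask + alpha * l_recon
```

The encoder trained to minimize this joint objective is what gets reused for the downstream semi-supervised stage.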

